ABSTRACT

This paper is an investigation of a goodness of fit
test for bivariate normal distributions. The test pro- 
cedure is based on random linear functions of bivariate 
normal random variables. The test makes use of the maximum 
Kolmogorov D(M) statistic over the linear functions which 
are computed. An estimate of the distribution of M is 
obtained by computer simulation. No attempt is made to 
determine the power of the test. 
I. INTRODUCTION



An important application of Statistics is to attempt 
to find a specific probability distribution which fits an 
observed sample of data. A well fitting distribution can 
then be used to predict values of future occurrences, 
relative frequencies of future occurrences, etc. There 
are numerous methods available for testing the goodness 
of fit of data in scalar form to a hypothesized univariate 
probability distribution. Among these are the Chi-square 
and the Kolmogorov-Smirnov tests, of which the Kolmogorov- 
Smirnov test is considered the more powerful [1]. Furthermore,
the Kolmogorov-Smirnov test exhibits the very
attractive characteristic that it is based on a statistic
whose distribution does not depend on the distribution of
the random variable being sampled.

However, there appears to be a lack of methods for 
testing the fit of hypothesized multivariate distributions 
to multivariate data (observations in vector form). 
Furthermore, no statistic which has the desirable character- 
istic of being distribution free and which can be used in 
multivariate goodness of fit tests has been found. In 
fact, no such statistic may even exist. For instance, 
Simpson [2] has shown an example of continuous bivariate 
distributions for which the analog of the Kolmogorov- 
Smirnov statistic is dependent on the underlying distribution. 






Rosenblatt [3] discusses a possible test which involves 
a transformation of an absolutely continuous k-variate 
distribution into the uniform distribution on the 
k-dimensional hypercube. The transformation is uniquely 
determined by the theoretical distribution against which 
the sample is to be tested. Then the transformed sample 
may be tested against the uniform distribution in 
k-dimensions. There are several disadvantages to this
procedure, however. For example, the results are influenced 
by the manner in which the components of the observed 
vectors are ordered. 

The purpose of this paper is to describe a goodness of 
fit test for testing a bivariate normal distribution (with 
given mean and covariance matrix) against samples of bi- 
component data. The test results in acceptance or rejection 
of the hypothesis that $F_N = F$, where $F_N$ is the cumulative
distribution function of the population of the bivariate 
sample vectors and F is the hypothesized cumulative distri- 
bution function. The notation in this paper closely follows 
the notation used by Anderson [4]. It is expected that the 
test developed here for the bivariate normal distribution 
can be extended to the case of the k-variate normal distri- 
bution. This paper is restricted to a consideration of 
testing the fit of samples to a distribution which has 
zero mean. This restriction causes no loss of generality 
since any distribution with a finite mean can be translated 
to mean zero by a linear transformation. 






The goodness of fit test was developed according to the 
following procedures: 

1) a characterization of the bivariate normal distri- 
bution is used to develop a test statistic M for use in a 
goodness of fit test, 

2) the distributional properties of M are investigated 
by computer simulation. 
II. THEORY



Since there appeared to be no widely known statistic 
for a reasonable goodness of fit test for multivariate 
distributions in general, and the multivariate normal 
distribution in particular, it seemed plausible that a 
statistic suitable for a goodness of fit test might be 
found by considering characterizations of the multivariate 
normal distribution. 

One property which characterizes a multivariate normal 
distribution is given in the following theorem [4]: 

Theorem 1. A p-dimensional random variable X has a
p-variate normal distribution if and only if every
linear function of X has a univariate normal distribution.

The parameters of the univariate normal distribution can be 
computed according to theorem 2. 

Theorem 2. Let X (a column vector with p components)
be distributed according to $N(\mu, \Sigma)$, a multivariate
normal distribution with mean (vector) $\mu$ and covariance
(matrix) $\Sigma$, and let C be a row vector of p
constants. Then

$$Y = CX$$

is distributed as univariate normal with mean $C\mu$ and
variance $C \Sigma C'$ (C' is the transpose of C).
(NOTE: CX can be described as a linear combination of the
components of X.)
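As a hedged illustration of Theorem 2, the following Python sketch computes the mean $C\mu$ and variance $C \Sigma C'$ of Y = CX. The function name `linear_combination_params` and the numerical values of C and $\Sigma$ are assumptions chosen for the example; they are not taken from the paper's program.

```python
def linear_combination_params(c, mu, sigma):
    """Return (mean, variance) of Y = C X when X ~ N(mu, sigma).

    c     : list of p constants (the row vector C)
    mu    : list of p means
    sigma : p x p covariance matrix as a list of rows
    """
    p = len(c)
    mean = sum(c[i] * mu[i] for i in range(p))
    # C Sigma C' written out as a double sum over the matrix entries.
    var = sum(c[i] * sigma[i][j] * c[j] for i in range(p) for j in range(p))
    return mean, var


# Illustrative values only (hypothetical C, zero mean, a 2 x 2 covariance).
mean, var = linear_combination_params([0.3, 0.7], [0.0, 0.0],
                                      [[5.0, 1.0], [1.0, 2.0]])
print(mean, var)
```

The variance is the quantity needed later to specify the hypothesized univariate normal distribution of each linear combination.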

From the characterization of the multivariate normal 
distribution given in theorem 1, it was felt that a suitable 
goodness of fit test procedure might be to test the result 
of a linear combination of the sample vectors (whose distri- 
bution had been hypothesized as a specific multivariate 
normal distribution) against the hypothesized theoretical 
univariate normal distribution which has been computed for 
the particular linear combination. Thus the problem is 
reduced to the univariate level and use can be made of well 
known univariate statistics which provide acceptable good- 
ness of fit tests. 

However, theorem 1 states that every linear combination 
of multivariate normal random variables must be univariate 
normal. Obviously one linear combination will not suffice 
for a reasonable test. It is not difficult to envision 
that there exists some linear function of nearly any vector 
sample which will transform that vector sample into one 
which is accepted as univariate normal. In fact, if the 
marginal distributions of the components are univariate
normal, but the joint distribution is not multivariate 
normal, the linear combination consisting of one component 
(e.g., $Y = X_1 + 0X_2 + \cdots + 0X_p = X_1$) is univariate normal. Thus
a test which uses only one linear combination might be 
manipulated by the tester to give any results he desires. 
On the other hand, it is clearly impossible to compute
every linear combination of a sample. As a compromise, it 
was felt that a number (to be determined) of randomly 
selected linear combinations would serve as a representa- 
tive sample upon which an overall test statistic might be 
based. To produce random linear combinations, the (column) 
sample vectors were multiplied by a (row) vector of random 
constants. The random components of the 'multiplying 
vectors' were drawn from the uniform (0,1) distribution. 

A uniform (0,1) distribution for the random multipliers 
was used because: 

1) Up to multiplicative constants, essentially any 
linear combination of the components of the multivariate 
vector could be produced using coefficients from the uniform 
(0,1) distribution, and 

2) A component of a random multiplier was equally 
likely to be contained in any one interval in (0,1) as in 
any other interval, provided the intervals were of equal 
length. Thus there should be no specific interval containing
a 'concentration' of the multipliers which might
adversely influence the performance of the goodness of fit
test.

NOTE: The results of the goodness of fit test described
in this paper using random multipliers from a uniform (0,1)
distribution were the same as results obtained using random
multipliers from a uniform (-2,2) distribution.
The Kolmogorov-Smirnov test was employed to determine
acceptance (or rejection) of the hypothesis that the 
linear combinations of bivariate sample vectors are from 
the (computed) theoretical univariate distributions noted 
in theorem 2. As noted previously, the Kolmogorov-Smirnov 
test is considered more powerful than the Chi-square test. 

Of course, the distribution free characteristic of the 
Kolmogorov D statistic applies in particular to linear 
combinations of the components of multivariate normal 
random variables. A description of the Kolmogorov- 
Smirnov test is presented in the following paragraphs. 

One method of testing the simple hypothesis, H: $F_N = F$,
where $F_N$ is the cumulative distribution of the population
sampled and F is the theoretical continuous distribution
proposed for the population, uses the Kolmogorov D statistic
[5]. The asymptotic distribution of D was investigated by
Kolmogorov and tabulated by Smirnov [6] and, for small
sample sizes, by Massey [7].

The Kolmogorov D statistic is derived from the sample
cumulative distribution function, $S_N$, and the proposed
theoretical cumulative distribution function, F, as follows.

Let $Y_1, \ldots, Y_N$ be a random sample from a continuous population
with cumulative distribution function F. Let $Z_1, \ldots, Z_N$ be
the order statistics of Y, so that

$$-\infty < Z_1 \le Z_2 \le \cdots \le Z_N < \infty .$$

The sample cumulative distribution, then, is

$$S_N(x) = \begin{cases} 0, & x < Z_1, \\ j/N, & Z_j \le x < Z_{j+1}, \quad j = 1, \ldots, N-1, \\ 1, & x \ge Z_N. \end{cases}$$

The Kolmogorov D statistic is defined as 



$$D = \sup_x | S_N(x) - F(x) |$$



and can be described roughly as the maximum deviation of 
the sample cumulative distribution function from the pro- 
posed theoretical cumulative distribution function. The 
D statistic has the property that its distribution does 
not depend upon the underlying distribution F. Clearly, 
it is dependent on the sample size N, because the sample
cumulative distribution function, $S_N$, takes as values only
multiples of 1/N. Naturally the D statistic approaches
zero, almost surely, as N increases without limit, provided
the sample is actually from a population with
distribution F. Critical values, T, of the D statistic are
obtained from the tabulated distributions and are used with
a sample to determine acceptance or rejection (reject if
$D \ge T$) of the hypothesis that $F_N = F$.
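The definition of D can be illustrated with a short Python sketch. It uses the standard fact that, for a continuous F, the supremum is reached at or just before one of the order statistics, so only those points need to be examined. The names `normal_cdf` and `kolmogorov_d` and the sample values are assumptions made for this example.

```python
import math


def normal_cdf(x, mean=0.0, var=1.0):
    """Cumulative distribution function of a univariate normal."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def kolmogorov_d(sample, cdf):
    """Return D = sup_x |S_N(x) - F(x)| for a continuous hypothesized cdf.

    S_N jumps from (j-1)/N to j/N at the j-th order statistic Z_j, so the
    largest deviation occurs at or immediately before some Z_j.
    """
    z = sorted(sample)
    n = len(z)
    d = 0.0
    for j, zj in enumerate(z, start=1):
        f = cdf(zj)
        d = max(d, j / n - f, f - (j - 1) / n)
    return d


# Usage with a small made-up sample tested against N(0, 1).
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
print(kolmogorov_d(sample, normal_cdf))
```

The hypothesis would be rejected when the returned value is at least the tabulated critical value T for the given sample size and level.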

One value of the Kolmogorov D statistic is derived from 
each linear combination of a bivariate sample of vectors. 
For example, let $X = (X_1, \ldots, X_N)$, where $X_i = (x_{i1}, x_{i2})'$ and
each $x_{ij}$ is a scalar, be a random sample of size N of bivariate
random vectors. A linear combination Y = CX, where
$C = (c_1, c_2)$ and each $c_i$ is a scalar, is a vector of N scalars,
$(Y_1, \ldots, Y_N)$. Let $\Sigma$ be the hypothesized covariance matrix
of the distribution of X. To obtain a rough test of the
hypothesis that the distribution of X is bivariate normal
with covariance matrix $\Sigma$ (and mean zero), one may test the
hypothesis that Y = CX is distributed as univariate normal
with variance $C \Sigma C'$ (and mean zero). A Kolmogorov-Smirnov
test to determine the acceptance of the hypothesis when
one particular value of C, say $C_i$, is used to compute a
linear combination will yield one value of D, that is

$$D = \sup_y | F_Y(y) - S_N^*(y) |$$

where $F_Y$ is the hypothesized univariate normal cumulative
distribution of Y and $S_N^*$ is the sample cumulative distribution
of the transformed sample. When the procedure listed
above is repeated for a different value of C, say $C_j$, then
another value of D, which may or may not be identical to
the first value, is obtained. If every value of D obtained 
with various values of C is less than the critical value of 
D for the given sample size and level, then the hypothesis 
is accepted. Likewise if every value of D obtained from 
using various values of C is greater than the critical 
value, then the hypothesis is rejected. However, some 
linear combinations of typical samples from a bivariate 
normal population can be expected to give values of D 
exceeding the critical value while other linear combinations
may give values of D less than the critical value. To 
eliminate the ambiguity of such results, another statistic 
must be used, preferably one whose distribution function 
can be readily tabulated or computed. For this purpose we 
use the maximum of the D statistics, M, derived from 
Kolmogorov-Smirnov tests of a large number, m, of linear 
combinations of the sample. That is, 

$$M = \max_i \sup_y | S_{N,i}(y) - F_i(y) |, \qquad i = 1, 2, \ldots, m,$$

where $S_{N,i}$ is the sample cumulative distribution function
of the linear combination $C_i X$ and $F_i(y)$ is the hypothesized
theoretical cumulative distribution of the linear combination
$C_i X$.
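A minimal Python sketch of the M statistic follows, assuming a zero-mean hypothesized distribution and multipliers drawn from the uniform (0,1) distribution as described earlier. The function names, the seed, and the identity covariance matrix in the usage lines are assumptions for illustration; D is computed exactly at the order statistics here.

```python
import math
import random


def normal_cdf(y, var):
    """CDF of a zero-mean univariate normal with variance var."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0 * var)))


def kolmogorov_d(values, cdf):
    """Kolmogorov D of the values against a continuous cdf."""
    z = sorted(values)
    n = len(z)
    return max(max(j / n - cdf(v), cdf(v) - (j - 1) / n)
               for j, v in enumerate(z, start=1))


def max_d_statistic(sample, sigma, m, rng):
    """M = maximum D over m random linear combinations C_i X.

    sample : list of (x1, x2) pairs
    sigma  : hypothesized 2 x 2 covariance matrix (list of rows)
    rng    : random.Random instance supplying the uniform (0,1) multipliers
    """
    best = 0.0
    for _ in range(m):
        c1, c2 = rng.random(), rng.random()
        # Variance C Sigma C' of the combination under the hypothesis.
        var = (c1 * c1 * sigma[0][0] + 2.0 * c1 * c2 * sigma[0][1]
               + c2 * c2 * sigma[1][1])
        if var <= 0.0:  # degenerate multiplier; skip it
            continue
        y = [c1 * x1 + c2 * x2 for x1, x2 in sample]
        best = max(best, kolmogorov_d(y, lambda t, v=var: normal_cdf(t, v)))
    return best


rng = random.Random(1)
sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(25)]
print(max_d_statistic(sample, [[1.0, 0.0], [0.0, 1.0]], 25, rng))
```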

The values of the D statistics derived from linear com- 
binations of a particular sample appear to be more highly 
dependent on the sample than on the random multiplying 
vectors (see Section III, Empirical Results). Therefore 
the maximum D might be expected to be a result of only the 
sample so that rejection or acceptance of the hypothesis 
H: $F_N = F$, where $F_N$ is the cumulative distribution function
of the population sampled, and F is the proposed theoretical 
cumulative distribution function, would depend only on the 
bivariate sample. 

The distribution of M, as defined above, appeared to
be intractable in closed mathematical form. However,
the properties of the distribution of M were investigated
for several cases by examining empirical data obtained from 
computer simulation. The data was produced by generating 
samples of a specified bivariate normal distribution, com- 
puting random linear combinations of the sample vectors, 
and recording the maximum of the resulting Kolmogorov D 
statistics. This procedure was repeated to give several 
lists of 500 values of M. Each list of 500 values of M 
was derived from bivariate sample vectors with different 
underlying bivariate normal distributions. The empirical 
results and generating techniques are discussed in Sections 
III and IV. 
III. EMPIRICAL RESULTS



In order to obtain the empirical data to study the 
distribution of M, the maximum D statistic, a computer 
program was written to accomplish the following for 
specific selections of the covariance matrix, $\Sigma$:

1. Generate a sample of desired size of bivariate 
normal random vectors from the given distribution, 

2. Generate the desired number of multiplying vectors, 
each of which would produce one linear combination of the 
sample vectors, 

3. Compute the linear combinations of the sample 
vectors by vector multiplication, 

4. Perform a Kolmogorov-Smirnov test on each univariate
sample obtained as a result of a linear combination and
record the resulting value of D. The values of

$$D = \sup_x | S_N(x) - F(x) |,$$

where $S_N$ and F are the functions previously defined, were
determined at values of .01K (K = 1, ..., 100) for the proposed
theoretical distribution F. That is, the value of the
sample cumulative distribution, $S_N(x)$, was evaluated at each
point $x_i$ where $F(x_i)$ was a multiple of .01, D being taken
to be the maximum value of the 100 differences

$$| S_N(x_i) - F(x_i) | .$$

5. Record the maximum value, over the linear combina-
tions performed on each sample, of the D statistics pro- 
duced by each particular sample. 
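The evaluation rule in step 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's program: the points $x_i$ are found with `statistics.NormalDist.inv_cdf`, the hypothesized distribution is zero-mean normal with a given variance, and the function name `grid_d` is hypothetical. Because the differences are taken only at the grid points, the result can be no larger than the exact supremum.

```python
import bisect
from statistics import NormalDist


def grid_d(sample, var, steps=100):
    """Approximate D using only points x_i with F(x_i) = i/steps.

    sample : univariate observations (one linear combination)
    var    : variance of the hypothesized zero-mean normal distribution
    """
    dist = NormalDist(0.0, var ** 0.5)
    z = sorted(sample)
    n = len(z)
    d = 0.0
    # i = steps corresponds to F = 1 at +infinity, where S_N = 1 as well,
    # so that point contributes a difference of zero and is skipped.
    for i in range(1, steps):
        p = i / steps
        x = dist.inv_cdf(p)
        s = bisect.bisect_right(z, x) / n  # S_N(x): fraction of sample <= x
        d = max(d, abs(s - p))
    return d


# Usage with a small made-up sample and hypothesized variance 1.
print(grid_d([-1.2, -0.4, 0.1, 0.3, 0.9, 1.7], 1.0))
```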

The initial simulation procedure generated sets of 100 
random vectors from a bivariate normal distribution with 
covariance matrix $\Sigma$. One hundred 'random' linear combinations
of each set of vectors were computed and the D
statistic derived from each linear combination was recorded.
The results of this simulation indicated that the D sta- 
tistics for a given sample were grouped within an interval 
approximately .05 units in length, but the location of the 
interval was dependent upon the particular sample. This 
phenomenon suggested that the value of D is dependent on 
the sample to a higher degree than it is on the linear 
function used. The results of five such simulations for 
samples from each of two different bivariate normal distri- 
butions are summarized in Table I. Note that with sample 
number 1b, several D values exceeded the univariate
Kolmogorov-Smirnov critical value at the .05 level of sig- 
nificance. Thus, using the univariate Kolmogorov-Smirnov 
critical value, the hypothesis that the sample was from the 
underlying distribution from which it was generated would 
have been rejected for some linear combinations and 
accepted for others. But using a critical value (determined 
by level of significance and sample size) for maximum D 
would have eliminated the ambiguity. 

It was also found that the relative frequency, within 
each interval of length .01, of the D statistics remained 
nearly constant as the number of linear combinations was varied. 
The data produced by the simulation procedure described
above led to the consideration of using the maximum D
statistic for testing a bivariate sample against the proposed
bivariate normal distribution. These simulation data
indicate that the maximum D statistic derived from random
linear combinations of samples from a bivariate normal
population has the desirable characteristics that:

1) A unique maximum is obtained for each sample, 
independent of the random multipliers, provided a sufficient 
number of linear combinations are computed, and 

2) The maximum value is obtained from various linear 
combinations, at least one of which could be randomly 
selected, with high probability, in as few as 25 trials 
(selections) of multiplying vectors. In all cases investi- 
gated, including those listed in Table I, the same value of 
M was achieved over 25 linear combinations as was achieved 
in 100 linear combinations for each particular sample. 

As noted in Section II, the exact distribution of M was 
found to be intractable. Therefore, in order to study 
some of the characteristics of the distribution of M, 
another computer simulation procedure was used to produce 
a large sample of M. The simulation procedure may be 
described as follows: 

1) A sample of 25 vectors was generated from a pre- 
determined bivariate normal distribution. Since D, and 
therefore M, are dependent on sample size, it was recog- 
nized that data obtained by this simulation would pertain 
to samples of size 25 only. However, one might expect the
characteristics of the distribution of M to be similar for 
all sample sizes. 

2) Twenty-five randomly selected multiplying vectors
were generated so that 25 linear combinations of each 
sample were produced. The maximum D over the resulting 25 
univariate samples was recorded. (From the initial simula- 
tion, it was expected that 25 linear combinations would 
produce the maximum D for any sample.) A total of 500 M 
statistics, all derived from the same underlying bivariate 
normal distribution, were produced. 

The simulation procedures were repeated for different 
parameter values of the underlying bivariate normal distri- 
bution to produce five sets of 500 statistics. Thus, each 
set of 500 values was derived from linear combinations of 
samples drawn from a different bivariate normal distribution.
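The simulation just described can be sketched in Python as below. This is a hedged illustration rather than the paper's program: the function names, the seed, and the covariance matrix in the usage lines are assumptions, the bivariate vectors are drawn by the conditional approach of Section IV, and D is computed exactly at the order statistics instead of on the .01 grid used in the paper.

```python
import math
import random


def normal_cdf(y, var):
    """CDF of a zero-mean univariate normal with variance var."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0 * var)))


def kolmogorov_d(values, var):
    """Exact Kolmogorov D of the values against N(0, var)."""
    z = sorted(values)
    n = len(z)
    d = 0.0
    for j, v in enumerate(z, start=1):
        f = normal_cdf(v, var)
        d = max(d, j / n - f, f - (j - 1) / n)
    return d


def draw_sample(n, sigma, rng):
    """n zero-mean bivariate normal vectors: X2 first, then X1 given X2."""
    slope = sigma[0][1] / sigma[1][1]
    cond_sd = math.sqrt(sigma[0][0] - sigma[0][1] * sigma[1][0] / sigma[1][1])
    out = []
    for _ in range(n):
        x2 = rng.gauss(0.0, math.sqrt(sigma[1][1]))
        out.append((rng.gauss(slope * x2, cond_sd), x2))
    return out


def simulate_m(sigma, n=25, m=25, reps=500, seed=0):
    """Return reps values of M, each from m combinations of one sample."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        sample = draw_sample(n, sigma, rng)
        best = 0.0
        for _ in range(m):
            c1, c2 = rng.random(), rng.random()
            var = (c1 * c1 * sigma[0][0] + 2.0 * c1 * c2 * sigma[0][1]
                   + c2 * c2 * sigma[1][1])
            if var <= 0.0:  # degenerate multiplier; skip it
                continue
            y = [c1 * x1 + c2 * x2 for x1, x2 in sample]
            best = max(best, kolmogorov_d(y, var))
        results.append(best)
    return results


m_values = simulate_m([[5.0, 1.0], [1.0, 2.0]], reps=100)
print(sum(m_values) / len(m_values))
```

Repeating the call with a different covariance matrix gives another set of M values that can be compared with the first.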

The results of the simulation described above are sum- 
marized in Table II. Unfortunately, it appears that there 
is not a simple statistical relationship between the distri- 
butions of the M statistics obtained with the samples drawn 
from different bivariate distributions. And, of course, 
if each different underlying distribution (of the sample) 
produces a different distribution of the M statistic, it 
would be impossible to tabulate values of all distribution 
functions of M. 
It seemed plausible that a similarity or other
relationship existed between the distributions of M
statistics obtained from samples which were derived from 
bivariate normal distributions with identical correlation 
coefficients. Therefore, a final simulation procedure was 
repeated, using samples drawn from several non-identical 
bivariate normal distributions with constant correlation 
coefficients. The results are summarized in Table III. 

Although there is a notable similarity between the 
distributions of M statistics derived from bivariate normal 
distributions with identical correlation coefficients ($\rho$),
the hypothesis that the distributions are identical was
rejected by a Kolmogorov-Smirnov test at the .05 level
of significance. This is also readily apparent for the
case in which $\rho$ = .3162. Note that the difference in the
means of the samples of M is .0059, whereas one standard
deviation of the mean (computed from the sample standard 
deviation) is approximately .0022. Thus the means are 
nearly three standard deviations apart, which suggests that 
the distributions are not the same. 
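As a quick check of the comparison above, using only the two figures quoted in the text, the ratio of the difference in means to the quoted standard deviation of the mean is

```latex
\frac{.0059}{.0022} \approx 2.7 ,
```

that is, a little under three standard deviations.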

An interpretation of these results and how they may be 
applied to a possible goodness-of-fit test for the bivariate 
normal distribution is discussed in Section V, Summary and 
Conclusions.
IV. RANDOM VARIABLE GENERATION TECHNIQUES



In order to study the distribution of the M statistics 
described in Sections II and III, it was necessary to 
produce a large number of random variables from various 
bivariate normal distributions. There are several possible 
methods which might be used to generate the bivariate 
normal random variables on a computer. One method would 
be to generate independent normal random variables and 
perform an appropriate transformation on them which will 
produce a bivariate normal random vector. For example, to 
generate random vectors from a bivariate normal distribution
with mean zero and covariance matrix $\Sigma$, where $\Sigma$ is
symmetric and positive definite, one could use the following
procedure.

1) Generate two independent random variables from a
normal (0,1) distribution, so that $X = (X_1, X_2)'$ is bivariate
normal (0, I), where I is the identity matrix.

2) Perform the transformation Z = CX, where C satisfies
$CC' = \Sigma$. Then $Z = (Z_1, Z_2)'$ is bivariate normal $(0, \Sigma)$.
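The transformation approach in steps 1) and 2) can be sketched as follows. The paper requires only that $CC' = \Sigma$; taking C to be the lower-triangular (Cholesky) factor is an assumption made here for concreteness, and the function names and numerical values are hypothetical.

```python
import math
import random


def cholesky_2x2(sigma):
    """Lower-triangular C with C C' = sigma for a 2 x 2 positive definite sigma."""
    c11 = math.sqrt(sigma[0][0])
    c21 = sigma[1][0] / c11
    c22 = math.sqrt(sigma[1][1] - c21 * c21)
    return [[c11, 0.0], [c21, c22]]


def draw_by_transformation(sigma, rng):
    """Return Z = C X, where X has two independent N(0, 1) components."""
    c = cholesky_2x2(sigma)
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (c[0][0] * x1, c[1][0] * x1 + c[1][1] * x2)


# Usage with an illustrative covariance matrix.
rng = random.Random(0)
print(draw_by_transformation([[5.0, 1.0], [1.0, 2.0]], rng))
```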

In this study the bivariate normal random vectors were 
generated using a conditional distribution approach. It 
is a well known characteristic of the bivariate normal 
distribution that if $X = (X_1, X_2)'$ is distributed bivariate
normal $(\mu, \Sigma)$, where $\mu = (\mu_1, \mu_2)'$ and

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix},$$

then $X_2$ is distributed univariate normal $(\mu_2, \sigma_{22})$. It
is also well known that the conditional distribution of
$X_1$, given $X_2 = x_2$, is univariate normal

$$\left( \mu_1 + \sigma_{12}\sigma_{22}^{-1}(x_2 - \mu_2),\; \sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21} \right).$$

Therefore, after
generating $X_2$ from a univariate normal $(\mu_2, \sigma_{22})$ distribution,
the conditional distribution of $X_1$, given $X_2 = x_2$,
was computed and $X_1$ was then generated from that univariate
normal distribution.
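A minimal Python sketch of this conditional approach is given below. It follows the two steps just described; the function name `conditional_bivariate_normal`, the use of `random.Random.gauss` in place of the paper's generator routines, and the numerical values are assumptions for illustration.

```python
import math
import random


def conditional_bivariate_normal(mu, sigma, rng):
    """Draw (X1, X2) from bivariate normal (mu, sigma) by conditioning.

    X2 is drawn from its marginal N(mu2, sigma22); X1 is then drawn from
    its conditional normal distribution given X2 = x2.
    """
    x2 = rng.gauss(mu[1], math.sqrt(sigma[1][1]))
    cond_mean = mu[0] + sigma[0][1] / sigma[1][1] * (x2 - mu[1])
    cond_var = sigma[0][0] - sigma[0][1] * sigma[1][0] / sigma[1][1]
    x1 = rng.gauss(cond_mean, math.sqrt(cond_var))
    return x1, x2


# Usage: a sample of 25 vectors from an illustrative distribution.
rng = random.Random(0)
sample = [conditional_bivariate_normal((0.0, 0.0), [[5.0, 1.0], [1.0, 2.0]], rng)
          for _ in range(25)]
print(sample[:3])
```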

To verify that this produces a random vector with the 
characteristics of the given bivariate normal distribution, 
consider the following: 

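Let $\mu = (0, 0) = (\mu_1, \mu_2)$.

Generate $X_2 = x_2$ from its marginal distribution, $N(0, \sigma_{22})$.
Then,

$$E(X_2) = 0$$

and

$$E[(X_2 - \mu_2)^2] = E(X_2^2) = \sigma_{22}.$$

Generate V, independent of $X_2$, from a univariate normal
(0,1) distribution.

Now

$$E(V) = 0, \quad \text{and} \quad E(V^2) = 1.$$

Now let

$$X_1 = (\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2} V + \sigma_{12}\sigma_{22}^{-1} X_2 .$$

Given $X_2$, $X_1$ is univariate normal $(\sigma_{12}\sigma_{22}^{-1} X_2,\; \sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})$, and

$$E(X_1) = E[(\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2} V + \sigma_{12}\sigma_{22}^{-1} X_2] = 0 = \mu_1 .$$

Similarly, the covariance between $X_1$ and $X_2$ is

$$\begin{aligned}
\mathrm{Cov}(X_1, X_2) &= E[(X_1 - \mu_1)(X_2 - \mu_2)] = E(X_1 X_2) \\
&= E\{[(\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2} V + \sigma_{12}\sigma_{22}^{-1} X_2] X_2\} \\
&= (\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2} E(V X_2) + \sigma_{12}\sigma_{22}^{-1} E(X_2^2).
\end{aligned}$$

Since V and $X_2$ are independent,

$$E(V X_2) = E(V) E(X_2) = 0.$$

Continuing from above,

$$\mathrm{Cov}(X_1, X_2) = 0 + \sigma_{12}\sigma_{22}^{-1}\sigma_{22} = \sigma_{12} = \sigma_{21}.$$

The variance of $X_1$ is

$$\begin{aligned}
E[(X_1 - \mu_1)^2] &= E(X_1^2) \\
&= E\{[(\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2} V + \sigma_{12}\sigma_{22}^{-1} X_2]^2\} \\
&= (\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21}) E(V^2) + 2(\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2} (\sigma_{12}\sigma_{22}^{-1}) E(V) E(X_2) + (\sigma_{12}\sigma_{22}^{-1})^2 E(X_2^2) \\
&= \sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21} + \sigma_{12}\sigma_{22}^{-1}\sigma_{12} = \sigma_{11},
\end{aligned}$$

since $\sigma_{12} = \sigma_{21}$.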
Incidentally, this technique can be extended to a method
of generation of p-component multivariate normal random
variables, since the distribution of $X_i$ (i = 1, 2, ..., p) given
any $X_j = x_j$ (j = 1, 2, ..., p, $j \ne i$) is also a normal distribution
whose parameters may be computed.

Standard computer routines were used to generate the 
univariate normal and uniform random variables required for 
the simulation procedure previously described. The routines 
are shown in the computer program under Subroutine RANDU
(for uniform random variables) and Subroutine GAUSS (for 
univariate normal random variables). 
V. SUMMARY AND CONCLUSIONS



The empirical data indicates that the distribution of 
the maximum of the D statistics (M), derived from Kolmogorov-
Smirnov tests of linear combinations of samples from 
bivariate normal distributions, was dependent upon the 
covariance matrix of the underlying distribution of the 
sample. Therefore it would be impossible to tabulate the 
distribution of M except for specific parameters of the 
underlying distribution. 

However, a goodness of fit test for the bivariate nor- 
mal distribution can be constructed using the M statistic. 

The test might consist of using a simulation procedure, 
similar to that used in this paper, to produce a sample 
distribution of M. This distribution of M would be derived 
from samples which are from a bivariate distribution 
identical to the proposed hypothesized bivariate distribu- 
tion. Then a critical value of M, for a test with level 
of significance $\alpha = \alpha_0$, may be established as the value at
which the $100(1 - \alpha_0)$ percentile point of the distribution
of M occurs. The number of linear combinations
and the number of M statistics for development of the
sample distribution of M must be determined by the experimenter
performing the test. (Note that the size of the
samples generated in the simulation procedure must be identical
to the size of the sample to be tested.)
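The critical value described above can be read off a simulated list of M values with a few lines of Python. This is a sketch under assumptions: the function names `critical_value` and `reject` are hypothetical, and the percentile convention (the smallest simulated value that at most a fraction $\alpha_0$ of the simulated values exceed) is one reasonable choice among several.

```python
import math


def critical_value(m_values, alpha):
    """Smallest simulated M that at most a fraction alpha of the values exceed."""
    z = sorted(m_values)
    n = len(z)
    # Number of simulated values allowed to lie above the cutoff.
    k = min(math.floor(alpha * n + 1e-9), n - 1)
    return z[n - k - 1]


def reject(m_observed, m_values, alpha=0.05):
    """Reject the hypothesis when the observed M exceeds the critical value."""
    return m_observed > critical_value(m_values, alpha)


# Usage with made-up values standing in for a simulated distribution of M.
simulated = [i / 1000.0 for i in range(100, 400, 3)]
print(critical_value(simulated, 0.05), reject(0.39, simulated))
```

In practice the list passed as `m_values` would come from a simulation run with the hypothesized covariance matrix and the same sample size as the sample under test.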
For example, suppose one wishes to test the hypothesis
that a sample of N vectors was drawn from a population whose
distribution is bivariate normal $(0, \Sigma)$, where

$$\Sigma = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}$$

(one of the distributions for which M is tabulated in Table
II). If N = 25, then the distribution of the M statistics
is shown in Table II, listed under the appropriate covariance
matrix. For $\alpha$ = .05, the critical value of M is .33,
the value at which the 95th percentile point of the distribution
occurs. Then 25 random linear combinations of the
sample vectors would be computed and the 25 resulting univariate
samples tested against the computed univariate
distribution by a Kolmogorov-Smirnov test. If the maximum
of the 25 D statistics thus obtained is greater than .33,
the hypothesis is rejected. Otherwise the hypothesis is
accepted.

There are obviously many interesting aspects concerning 
this (and other) multivariate goodness of fit tests which 
should be investigated. For example the power of the 
test described in this paper, when applied to samples from 
distributions other than the bivariate normal, might be 
investigated. Also, a goodness of fit test based on a 
statistic other than M (e.g., the mean or variance of D 
obtained from linear combinations of the sample components) 
might prove to be interesting. It is, of course, desirable 
to find a "reasonable" statistic for which the distribution 
may be found and tabulated. 
TABLE I

Distribution (Relative Frequency) of Kolmogorov-Smirnov
Statistics (D) that Result from 100 Linear Combinations
of 100 Bivariate Random Vectors from Two Bivariate Normal
Distributions

Value of                    SAMPLE NUMBER
K-S            Σ = (illegible)          Σ = (illegible)
Statistic      1a   1b   1c   1d   1e   2a   2b   2c   2d   2e

.04             -    -    -    -    -    -    -    2    -    -
.05             4    -    -   21   25    -   10   38    -    -
.06            11    -    -   50   66    -   35   41    -    -
.07            26    -    2   24    9    -   23   19    8    -
.08             6    -   22    5    -   17    9    -   28    -
.09            44   20   46    -    -   22   14    -   36    6
.10             9   17   30    -    -   32    9    -   18   31
.11             -    4    -    -    -   29    -    -   10   29
.12             -    9    -    -    -    -    -    -    -   20
.13             -   39    -    -    -    -    -    -    -   14
.14             -   11    -    -    -    -    -    -    -    -
.15             -    -    -    -    -    -    -    -    -    -

NOTE: 1) The critical value of the univariate Kolmogorov-
Smirnov statistic for this sample size and
α = .05 is .136.

2) Σ = Covariance matrix of the distribution from
which the sample was drawn.
TABLE II

Distribution (Relative Frequency) of 500 Maximum Kolmogorov-
Smirnov (M) Statistics Each of which was Derived from 25
Linear Combinations of Samples from Various Bivariate Normal
Distributions

Covariance Matrix of Distribution of Population from which
Samples were Drawn (matrix rows separated by semicolons):

(1) [5 3; 3 5]
(2) [4 2; 2 1]
(3) [4 0; 0 1]
(4) (partly illegible; the entries -1.8 and 1 are legible)
(5) [5 1; 1 2]

Range of Max.
K-S Statistic (M)      (1)     (2)     (3)     (4)     (5)

.06-.07                  -       1       -       -       -
.07-.08                  -       3       -       -       -
.08-.09                  -       7       -       -       -
.09-.10                  -      22       -       -       -
.10-.11                  -      22       -       -       1
.11-.12                  5      40       -       -       4
.12-.13                  8      44       5       -       6
.13-.14                 21      26       7       5       9
.14-.15                 31      55      18      12      21
.15-.16                 28      34      19      19      32
.16-.17                 38      37      31      23      43
.17-.18                 34      32      24      30      59
.18-.19                 41      22      41      35      35
.19-.20                 42      31      42      42      38
.20-.21                 39      20      50      40      31
.21-.22                 30      25      41      36      26
.22-.23                 27      16      33      32      31
.23-.24                 28      14      34      49      21
.24-.25                 25      12      31      38      33
.25-.26                 29       8      30      23      21
.26-.27                 20       6      20      21      20
.27-.28                  8       5      15      21      14
.28-.29                 18       6      15      21      15
.29-.30                 11       4      10      13      15
.30-.31                  7       3       7       8       8
.31-.32                  8       2       7       9       7
.32-.33                  0       2       7       8       7
.33-.34                  6       1       7      10      11
.34-.35                  2       -       1       2       3
.35-.36                  3       -       3       3       5
.36-.37                  2       -       2       -       2
.37-.38                  -       -       -       -       1

Mean*                .2033   .1624   .2151   .2237   .2061
Variance*            .0028   .0027   .0024   .0027   .0028

NOTE: 1) *Sample mean and sample variance of M statistics.

2) Each sample size was 25 bivariate vectors.
TABLE III

Distribution (Relative Frequency) of 500 Maximum Kolmogorov-
Smirnov Statistics (M) for Samples Drawn from Various
Bivariate Normal Distributions with Identical Correlation
Coefficient, $\rho$

2) Sample size = 25 vectors.

3) $\Sigma$ = covariance matrix of distribution of population
from which samples were drawn.
TMAX IS THE MAXIMUM VALUE OF THE K-S STATISTIC FROM EACH TEST

NVAR IS NUMBER OF VARIABLES IN THE SAMPLE